1 Analysis Cohort

In this analysis, there are 150 subjects and 6108 observations. Mean (range) of follow-up time per patient was 10 (0, 14) years. Mean (range) number of observations was 41 (1, 134). The same number of patients and observations were utilized in each model below. We assume for each model that the data are missing at random, except for the joint longitudinal-survival model. The dependent variable is GLI FEV1 (% predicted) named “y” in the output below. The time scale is patients’ age at visit. Age is expressed in years. Age at CF diagnosis (in years) was included as a covariate in each model. Below are summary statistics for each variable used in the model. Once a patient was recorded as having a lung transplant, his or her data were censored from the model.

##        y                Age        Age_Diag_per_patient
##  Min.   :  8.479   Min.   : 6.00   Min.   : 0.000      
##  1st Qu.: 46.243   1st Qu.:13.42   1st Qu.: 0.100      
##  Median : 69.822   Median :19.76   Median : 0.300      
##  Mean   : 69.122   Mean   :21.16   Mean   : 2.004      
##  3rd Qu.: 91.778   3rd Qu.:27.20   3rd Qu.: 1.700      
##  Max.   :134.829   Max.   :61.35   Max.   :17.500

The Kaplan-Meier (KM) survival curve for those who died or received a lung transplant is plotted below showing the survival probability over age at visit (top graphic) and the number still alive/without lung transplant (bottom graphic).

Here we plot the frequency of subjects who died or received lung transplant over age from 6 to 30 years old. The event rate is around 0.193 or 19.3%. As age increases, the frequency of death/lung transplant increases. Since there are more subjects who dropped out due to death or receiving a lung transplant, there is the potential for a survivor bias, which we investigate in a later section.

In the following report, we investigate three commonly used structures in CF FEV1 analysis. These include the linear mixed-effects model (LME), Generalized Estimating Equations (GEE) and joint modelling (JM). For the LME models, we include different combinations of random effects, splines and correlation structures as used in published CF studies (Section 2). We examine published GEE models combined with additional features for linearity/non-linearity (Section 3). We fit a joint model with the linear/non-linear structure from the LME models in Section 2.

2 Linear Mixed-Effects Model (LME)

2.1 LME Model with Random Intercept

2.1.1 Assumptions

Average evolution: A linear evolution is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have the same evolution over time.

2.1.2 Model Estimation

## [1] "AIC =  47483"
## [1] "BIC =  47516"
Variance Components
StdDev
(Intercept) 21.45834
Residual 11.11963
Value Std.Error DF t-value p-value
(Intercept) 104.477 2.140 5957 48.812 0.000
Age -1.695 0.041 5957 -41.768 0.000
Age_Diag 1.168 0.473 148 2.469 0.015

As the coefficient estimate for ‘Age’ is around -1.695 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.695 (unit % predicted).

2.1.3 Model Inference

We can calculate the population-level evolution (left) and rate of change (right) over time. As we have assumed a linear evolution, the left figure shows a straight line, corresponding to a linear estimated fit. The rate of change is calculated by taking first derivative of the model equation over time. This model estimates the rate of change in FEV1 as constant over different ages as -1.695.

2.1.4 Model Diagnostics

We examined standardized residuals in terms of randomness, spread and nature of distribution.

  • The residuals vs fitted plot (upper left): XXXXX.
  • The residuals vs age (upper right): XXXXX.
  • The quantile-quantile (QQ) plot (lower left): XXXXX.
  • The density of residuals (lower right): XXXXX..

2.2 LME Model with Random Intercept and Slope

2.2.1 Assumptions

Average evolution: A linear evolution is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have different evolution over time.

2.2.2 Model Estimation

## [1] "AIC =  45501"
## [1] "BIC =  45548"
Variance Components
StdDev
(Intercept) 38.266175
Age 1.884613
Residual 9.083908
Value Std.Error DF t-value p-value
(Intercept) 109.082 3.517 5957 31.020 0.000
Age -1.807 0.169 5957 -10.685 0.000
Age_Diag -0.588 0.594 148 -0.991 0.323

As the estimation of coefficient for ‘Age’ is around -1.807 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.807 (unit % predicted).

2.2.3 Model Inference

2.2.4 Model Diagnostics

2.3 LME Model with 5-knot Natural Spline

2.3.1 Assumptions

Average evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time.

Patient-specific evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time. A diagonal variance-covariance matrix for the random effects are assumed. This means that the random effects are not correlated.

The knots, listed below, were chosen according to a recursive algorithm (reference: Spline Regression with Automatic Knot Selection, Vivien Goepp (2018)). There are 6 knots in total selected for the entire dataset (age: 6 years and older).

## [1] "Knots location (at age): 9.8, 13.5, 17.3, 21.4, 31.3"

2.3.2 Model Estimation

## [1] "AIC =  44944"
## [1] "BIC =  45051"
Variance Components
StdDev
(Intercept) 17.395092
ns1 17.596203
ns2 21.803414
ns3 24.988258
ns4 63.200024
ns5 2.495193
ns6 52.892715
Residual 8.409801
Value Std.Error DF t-value p-value
(Intercept) 93.443 2.320 5952 40.275 0.000
ns1 -13.259 2.588 5952 -5.122 0.000
ns2 -15.831 3.102 5952 -5.104 0.000
ns3 -35.268 3.427 5952 -10.291 0.000
ns4 -37.766 8.209 5952 -4.601 0.000
ns5 -100.247 10.892 5952 -9.204 0.000
ns6 -147.072 19.236 5952 -7.646 0.000
Age_Diag 0.869 0.552 148 1.575 0.117

2.3.3 Model Inference

2.3.4 Model Diagnostics

2.4 Linear Random Intercepts with Exponential Correlation

2.4.1 Assumptions

Average evolution: A linear mean evolution is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have the same evolution over time. Correlation structure: exponential correlation function. This means that, for each subject, the correlation of FEV1 exponentially decays to zero with the distance between times (ages).

2.4.2 Model Estimation

## [1] "AIC =  44503"
## [1] "BIC =  44550"
Variance Components
StdDev
(Intercept) 6.232874
Residual 23.428058
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Age | eDWID 
##  Parameter estimate(s):
##     range    nugget 
## 26.188394  0.101597
Value Std.Error DF t-value p-value
(Intercept) 100.426 3.022 5957 33.229 0.000
Age -1.467 0.121 5957 -12.134 0.000
Age_Diag 1.029 0.481 148 2.138 0.034

As the estimation of the coefficient for ‘Age’ is around -1.467 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.467 (unit % predicted).

2.4.3 Model Inference

2.4.4 Model Diagnostics

2.5 Spline Random Intercept with Exponential Correlation

2.5.1 Assumptions

Average evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point

Correlation structure: exponential correlation function. This means that, for each subject, the correlation of FEV1 exponentially decays to zero with the distance between times (age).

2.5.2 Model Estimation

## [1] "Knots location (at age): 9.8, 13.5, 17.3, 21.4, 31.3"
## [1] "AIC =  44494"
## [1] "BIC =  44575"
Variance Components
StdDev
(Intercept) 9.085261
Residual 22.030623
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Age | eDWID 
##  Parameter estimate(s):
##      range     nugget 
## 23.1608452  0.1150344
Value Std.Error DF t-value p-value
(Intercept) 91.998 2.949 5952 31.191 0.000
ns1 -11.480 2.860 5952 -4.014 0.000
ns2 -14.339 3.161 5952 -4.536 0.000
ns3 -33.946 3.543 5952 -9.580 0.000
ns4 -39.890 7.713 5952 -5.172 0.000
ns5 -73.488 14.425 5952 -5.095 0.000
ns6 -97.304 28.066 5952 -3.467 0.001
Age_Diag 0.874 0.478 148 1.827 0.070

2.5.3 Model Inference

2.5.4 Model Diagnostics

3 Linear Generalized Estimating Equations Model (GEE)

Compared with LME models, GEE models allow for weaker distributional assumptions and are more robust to covariance misspecification. In this analysis, the average evolution (fixed-effects) structure in the GEE model was selected based on the performance of all LME models. This GEE model is fitted with a linear structure to model the mean evolution and accounts for longitudinal correlation using the sandwich estimator.

3.1 Assumptions

Average evolution: A linear evolution (assuming natural cubic splines) is used over time.

Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point.

Correlation structure: independence (working) correlation function.

3.2 Model Estimation

Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 94.730 4.065 543.030 0.000
Age -1.291 0.173 55.437 0.000
Age_Diag 1.014 0.610 2.759 0.097
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 580.781 45.699

3.3 Model Inference

3.4 Model Diagnostics

4 Non-linear Generalized Estimating Equations Model (GEE)

In this analysis, the GEE mode is fitted with a spline structure to model the mean evolution and includes an independence correlation.

4.1 Assumptions

Average evolution: A non-linear evolution (assuming cubic natural splines) is used over time.

Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point.

Correlation structure: independence correlation function (sandwich estimator used to account for longitudinal correlation).

4.2 Model Estimation

Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 94.684 4.483 446.164 0.000
ns1 -15.931 5.057 9.926 0.002
ns2 -20.351 7.004 8.444 0.004
ns3 -44.588 7.271 37.602 0.000
ns4 -22.620 15.467 2.139 0.144
ns5 -82.724 24.720 11.198 0.001
ns6 -113.967 50.821 5.029 0.025
Age_Diag 0.873 0.565 2.383 0.123
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 550.579 44.588

4.3 Model Inference

4.4 Model Diagnostics

5 Linear Joint Model of Longitudinal FEV1 (% predicted) and Survival (JM)

Missing data due to death or lung transplant is accounted for by jointly modeling the FEV1 process with the survival process using a Weibull model for survival. We use a Weibull baseline risk function to fit the baseline hazard in the survival sub-model. This joint model was implemented using the shared parameter framework. The shared parameters here corresponded to connecting the survival outcome with the underlying value of FEV1 and rate of change in FEV1 over time.

5.1 Assumptions

We assume that FEV1 and survival are associated over time (age). To fit the longitudinal sub-model for FEV1, we apply the LME model with linear assumption; for survival, we use the Weibull survival model.

Average evolution of FEV1: A linear evolution is assumed over time.

Patient-specific evolution of FEV1: A linear evolution is assumed over time and each patient may start at different time point (and have the same evolution).

Survival Baseline hazard: A Weibull baseline risk hazard is assumed over time at baseline.

Association: We assume the underlying value of FEV1 and rate of change over time to be associated with survival.

5.2 Model Estimation and Interpretation

## [1] "AIC =  45677"
## [1] "BIC =  45713"

The variance components for FEV1 include the standard deviations of the random intercept as “(Intercept)”, slope as “Age” and residual as “Residual” reported below.

Variance Components
StdDev
(Intercept) 37.905
Age 1.924
Residual 9.085
Longitudinal Process
Value Std.Err z-value p-value
(Intercept) 109.532 2.986 36.681 0.000
Age -1.835 0.090 -20.303 0.000
Age_Diag -0.565 0.502 -1.124 0.261
Event Process
Value p-value Hazard Ratio Lower Upper
(Intercept) -7.867 0.011 0.000 0.000 0.161
Age_Diag -0.163 0.054 0.850 0.720 1.003
Assoct -0.131 0.000 0.877 0.835 0.922
Assoct.s -0.426 0.000 0.653 0.530 0.805
log(shape) 1.148 0.000 3.151 1.957 5.075

5.3 Model Inference

5.4 Model Diagnostics

## Warning in rm(list = ".Random.seed", envir = globalenv()): object
## '.Random.seed' not found

6 Non-linear Joint Model of Longitudinal FEV1(% predicted) and Survival (JM)

6.1 Assumptions

Average evolution of FEV1: A non-linear evolution by cubic natural spline is assumed over time.

Patient-specific evolution of FEV1: A linear evolution is assumed over time and each patient may start at different time point (and have the same evolution).

Survival Baseline hazard: A Weibull baseline risk hazard is assumed over time at baseline.

Association: We assume the underlying value of FEV1 and rate of change over time to be associated with survival.

6.2 Model Estimation and Interpretation

## [1] "AIC =  45654"
## [1] "BIC =  45705"
Variance Components
StdDev
(Intercept) 38.702
Age 1.978
Residual 9.056
Longitudinal Process
Value Std.Err z-value p-value
(Intercept) 91.038 3.242 28.077 0.00
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))1 -8.052 1.953 -4.123 0.00
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))2 -12.089 2.566 -4.711 0.00
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))3 -27.063 3.279 -8.253 0.00
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))4 -51.997 4.220 -12.321 0.00
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))5 -94.355 5.710 -16.523 0.00
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))6 -126.342 7.811 -16.175 0.00
Age_Diag -0.770 0.496 -1.554 0.12
Event Process
Value p-value Hazard Ratio Lower Upper
(Intercept) -6.241 0.011 0.002 0.000 0.240
Age_Diag -0.176 0.045 0.838 0.706 0.996
Assoct -0.133 0.000 0.876 0.836 0.918
Assoct.s -0.420 0.000 0.657 0.540 0.800
log(shape) 1.000 0.000 2.720 1.728 4.279

6.3 Model Inference

6.4 Model Diagnostics

## Warning in rm(list = ".Random.seed", envir = globalenv()): object
## '.Random.seed' not found

7 Plot Population-level FEV1 (% predicted) Estimation for All Models

Model Name Abbreviation
Name Specification
Int Linear evolution with random intercept
Int + Slope Linear evolution with random intercept and slope
Spline Non-linear evolution by using spline as both average evolution and random effect
Int + Corr Linear evolution with random intercept and exponential correlation structure
Spline + Corr Non-linear evolution with spline as average evolution, random intercept and exponential correlation structure
Lin_GEE Linear evolution with independence correlation structure
Nonlin_GEE Non-linear evolution by using spline with independence correlation structure
Lin_JM Linear evolution with jointly model FEV and survival
Nonlin_JM Non-linear evolution with jointly model FEV and survival

To check more closely, we zoom in the result for age 6 to 30.